The following dataset is a heavily cleaned-up version of the official data on slums from the Government's Rajiv Awas Yojana Scheme's Slum Free City Action Plan of 2011. This dataset is the only geocoded data for slums in India. While the quality of the data is questionable in places, it still offers insight into the spatial patterns of poverty in Indian cities.
In [16]:
# Start with the basics.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
In [17]:
slums = pd.read_csv("hyderabad_slum_master.csv")
In [18]:
slums.head()  # The dataset is spatialised, hence the the_geom column.
Out[18]:
Now let's look at the columns in the dataset.
In [19]:
slums.columns
Out[19]:
The dataset contains the following:
Name of the slum
The municipal circle and ward number
The number of households and the population of each slum
The number of households that are Below Poverty Line (BPL). This implies that some slum residents are not poor by the government's definition. This matters because the slum is often assumed to be a proxy for urban poverty; Gautam Bhan and Arindam Jana's paper argues otherwise (http://epw.yodasoft.com/journal/2015/22/review-urban-affairs-review-issues/reading-spatial-inequality-urban-india.html)
Caste parameters: In Indian cities, caste is an important indicator. Households classified as "General", "Scheduled Caste", "Scheduled Tribe" or "Other Backward Castes" are counted in separate columns.
Tenure parameters: Because the dataset was compiled with the aim of eliminating slums by providing government-built houses, the land tenure status of slum households is essential to know. The categories are Patta, Possession Certificate, slums on private land, slums on public land, households that rent, and other tenures that do not fit this framework.
Structure of the house: This describes the kind of house the slum dweller lives in. Pucca means a stable house made of concrete, Semipucca means a less stable house, and Kuccha means a house made of mud, wood, etc.
Average Monthly Income, Expenditure and Debt.
The number of years the dwellers have stayed in the slum
Number of Female Headed Households.
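Since the quality of the data is in question, a quick consistency check on the columns listed above can be useful. The sketch below assumes that the four caste counts should add up to the household count of each slum, which the source data does not guarantee. The toy table and the `slum_name` column are hypothetical stand-ins for the real file.

```python
import pandas as pd

# Hypothetical toy rows standing in for hyderabad_slum_master.csv.
toy = pd.DataFrame({
    "slum_name": ["A", "B", "C"],
    "households": [10, 20, 30],
    "caste_general": [4, 5, 10],
    "caste_sc": [3, 5, 10],
    "caste_st": [1, 5, 5],
    "caste_obc": [2, 5, 0],
})

caste_cols = ["caste_general", "caste_sc", "caste_st", "caste_obc"]

# Assumption: the caste counts of a slum should sum to its household count.
caste_total = toy[caste_cols].sum(axis=1)

# Rows where the two totals disagree are candidates for a closer look.
mismatched = toy[caste_total != toy["households"]]
print(mismatched[["slum_name", "households"]])
```

The same pattern could be tried on the tenure and structure columns, with the same caveat that the totals may legitimately differ.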
In [20]:
totalpopulation=slums['population'].sum()
print("The total number of people who live in slums in Hyderabad is", int(totalpopulation))
print("")
populationofhyderabad=6731790 #According to the 2011 census.
percentageofpopulation=(totalpopulation/populationofhyderabad)*100
print("The percentage of people who live in slums is",(percentageofpopulation))
Approximately a quarter of the city's population lives in slums.
In [21]:
print("The total number of slum households is",slums['households'].sum())
print("")
print("The average number of households in a slum is",slums['households'].mean())
print("")
print("The average household size is", (slums['population'].sum()/slums['households'].sum()))
Circle here refers to a Municipal Subdivision. The city is divided into 18 circles, with each circle being further subdivided into wards. The city has 150 of these wards.
In [22]:
circle_count=slums['circle_number'].value_counts()
circle_count
Out[22]:
In [23]:
plt.style.use('ggplot')
ax = circle_count.plot(kind='bar', legend=False, figsize=(10, 10))
ax.set_xlabel('Circle Number')
ax.set_ylabel('Number of Slums')
Out[23]:
Circle 4, which is the old city quarter of Hyderabad, has the largest number of slums.
In [24]:
slums['ward_number'].value_counts().head(20)
Out[24]:
Because the data is faulty, some slums have an empty ward number, recorded as 0. The rest of the list shows the wards with the most slums, with the number of slums in each on the right. Ward 108 therefore has the largest number of slums. Inspecting the map (in CartoDB) raises the question of why this ward has so many fragmented slums, often of no more than a couple of houses.
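One way to keep the placeholder ward out of the ranking is to filter it before counting. This is only a sketch on a hypothetical toy table (the `slum_name` column is assumed), and it treats every 0 as a missing ward number.

```python
import pandas as pd

# Hypothetical toy rows; the real data comes from hyderabad_slum_master.csv.
toy = pd.DataFrame({
    "slum_name": ["A", "B", "C", "D", "E"],
    "ward_number": [0, 108, 108, 12, 0],
})

# Assumption: ward 0 means "ward not recorded", so drop it before counting.
valid = toy[toy["ward_number"] != 0]
ward_counts = valid["ward_number"].value_counts()

# Keep track of how many slums were left out of the ranking.
missing_wards = int((toy["ward_number"] == 0).sum())

print(ward_counts)
print("Slums without a ward number:", missing_wards)
```

Reporting the number of excluded slums alongside the counts keeps the gap in the data visible.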
We count the total number of literates in each slum and compare it with the total population.
In [25]:
print("The total percentage of literacy in slums is",(slums['literacy_literates'].sum())/(slums['population'].sum())*100)
As argued above, there are valid concerns that the slum might not be a good proxy for urban poverty. But if the percentage of households that are Below Poverty Line is high enough, the slum can reasonably be treated as one.
In [26]:
(slums['number_of_bpl_households'].sum())/(slums['households'].sum())*100
Out[26]:
That is high enough for us to consider the slum as a decent proxy for urban poverty in Hyderabad.
Now, we create new columns in the dataframe using the existing columns and check.
In [27]:
slums['percentageofliterates']=slums['literacy_literates']/slums['population']*100
In [28]:
slums['percentageofliterates'].mean()
Out[28]:
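Note that the two literacy figures computed so far are different quantities. Dividing the summed literates by the summed population gives a pooled rate, in which large slums weigh more. Taking the mean of the per-slum percentages gives every slum the same weight. A minimal sketch on hypothetical numbers:

```python
import pandas as pd

# Hypothetical toy rows: one small, highly literate slum and one large one.
toy = pd.DataFrame({
    "population": [100, 900],
    "literacy_literates": [90, 450],
})

# Pooled rate: total literates over total population (population-weighted).
pooled = toy["literacy_literates"].sum() / toy["population"].sum() * 100

# Unweighted rate: mean of the per-slum percentages.
per_slum = toy["literacy_literates"] / toy["population"] * 100
unweighted = per_slum.mean()

print("Pooled literacy rate:", round(pooled, 2))
print("Mean of per-slum rates:", round(unweighted, 2))
```

Which of the two is appropriate depends on whether the question is about slum residents as a whole or about the typical slum.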
On closer inspection, the other variables are recorded per household, not per person like the literacy figure. So we add the following columns to the dataframe using the number of households as the denominator, starting with the percentage of households below the poverty line.
In [29]:
slums['percentageofbplhouseholds']=slums['number_of_bpl_households']/slums['households']*100
Now we repeat this for the Caste Parameters.
In [30]:
slums['percentageofgeneral'] = slums['caste_general']/slums['households']*100
slums['percentageofminority']= slums['minority']/slums['households']*100
slums['percentageofobc'] = slums['caste_obc']/slums['households']*100
slums['percentageofsc'] = slums['caste_sc']/slums['households']*100
slums['percentageofst'] = slums['caste_st']/slums['households']*100
And for the type of structure:
In [31]:
slums['percentageofpuccastructures'] = slums['structure_pucca']/slums['households']*100
slums['percentageofsemipuccastructures'] = slums['structure_semipucca']/slums['households']*100
slums['percentageofkucchastructures'] = slums['structure_kuccha']/slums['households']*100
And for the kind of tenure:
In [32]:
slums['percentageoftenure_patta'] = slums['tenure_patta']/slums['households']*100
slums['percentageoftenure_pc'] = slums['tenure_possession_certificate']/slums['households']*100
slums['percentageoftenure_private'] = slums['tenure_private_land']/slums['households']*100
slums['percentageoftenure_public'] = slums['tenure_public']/slums['households']*100
slums['percentageoftenure_renters'] = slums['tenure_rented']/slums['households']*100
slums['percentageoftenure_other'] = slums['tenure_others']/slums['households']*100
And finally for the number of years the residents have stayed:
In [33]:
slums['percentageoftenure_0to1'] = slums['zerotoone_years_of_stay']/slums['households']*100
slums['percentageoftenure_1to3'] = slums['onetothree_years_of_stay']/slums['households']*100
slums['percentageoftenure_3to5'] = slums['threetofive_years_of_stay']/slums['households']*100
slums['percentageoftenure_morethan5'] = slums['morethanfive_years_of_stay']/slums['households']*100
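The cells above all follow the same pattern: a count column divided by the number of households. A more compact variant is sketched below with a mapping from new column to source column. It also guards against slums with zero recorded households, which is an assumption about the data rather than something confirmed here. The toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; the middle slum has no households recorded.
toy = pd.DataFrame({
    "households": [10, 0, 40],
    "caste_sc": [5, 0, 10],
    "caste_st": [1, 0, 4],
})

# Map each new percentage column to the count column it is derived from.
share_columns = {
    "percentageofsc": "caste_sc",
    "percentageofst": "caste_st",
}

# Mask zero household counts as NaN so the division yields NaN for those rows.
households = toy["households"].where(toy["households"] > 0)

for new_col, count_col in share_columns.items():
    toy[new_col] = toy[count_col] / households * 100

print(toy)
```

Rows with a NaN percentage are skipped by `.mean()` by default, so they do not distort the averages computed later.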
Now, let us see what the slums dataframe looks like.
In [34]:
slums.head()
Out[34]:
In [35]:
slums.columns
Out[35]:
We now have all the derived columns needed for the analysis, so we can begin.
In [36]:
slums.sort_values(by='percentageofgeneral').head(5)
Out[36]:
In [37]:
slums.sort_values(by='percentageofsc').head(5)
Out[37]:
In [38]:
slums.sort_values(by='percentageofst').head(5)
Out[38]:
In [39]:
slums.sort_values(by='percentageofobc').head(5)
Out[39]:
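Keep in mind that `sort_values` sorts in ascending order by default, so `head(5)` in the cells above returns the slums with the lowest share of each caste. If the highest shares are wanted instead, either of the variants below should work. The toy rows and the `slum_name` column are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows with an already computed percentage column.
toy = pd.DataFrame({
    "slum_name": ["A", "B", "C", "D"],
    "percentageofsc": [10.0, 80.0, 45.0, 0.0],
})

# Default ascending sort: the first rows are the lowest shares.
lowest = toy.sort_values(by="percentageofsc").head(2)

# Descending sort: the first rows are the highest shares.
highest = toy.sort_values(by="percentageofsc", ascending=False).head(2)

# nlargest is a shorthand for the same selection.
highest_alt = toy.nlargest(2, "percentageofsc")

print(lowest)
print(highest)
```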
Now we look at the average share of slum households that belongs to each caste category.
In [48]:
a=slums['percentageofgeneral'].mean()
b=slums['percentageofsc'].mean()
c=slums['percentageofst'].mean()
d=slums['percentageofobc'].mean()
print(a,b,c,d)
In [49]:
plt.style.use('ggplot')
series=pd.Series([a,b,c,d], index=['General', 'SC', 'ST','OBC'], name='Caste Percentages')
series.plot.pie(figsize=(6, 6))
Out[49]:
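Instead of reusing the variables a, b, c and d for every chart, the means of several columns can be collected in one step. The sketch below uses hypothetical numbers and also checks whether the averaged shares add up to 100, which is worth knowing before reading a pie chart as a full breakdown.

```python
import pandas as pd

# Hypothetical toy rows with per-slum caste percentages.
toy = pd.DataFrame({
    "percentageofgeneral": [50.0, 20.0],
    "percentageofsc": [30.0, 40.0],
    "percentageofst": [0.0, 10.0],
    "percentageofobc": [20.0, 30.0],
})

# Map each column to the label used on the chart.
labels = {
    "percentageofgeneral": "General",
    "percentageofsc": "SC",
    "percentageofst": "ST",
    "percentageofobc": "OBC",
}

# Mean of every column at once, re-indexed with readable labels.
caste_means = toy[list(labels)].mean().rename(index=labels)
caste_means.name = "Caste Percentages"

# If this is far from 100, some households fall outside these categories.
total = caste_means.sum()

print(caste_means)
print("Sum of mean shares:", total)
```

Calling `caste_means.plot.pie(figsize=(6, 6))` on the result would draw the same kind of chart as above.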
In [50]:
a=slums['percentageofpuccastructures'].mean()
In [51]:
b=slums['percentageofsemipuccastructures'].mean()
In [52]:
c=slums['percentageofkucchastructures'].mean()
In [53]:
plt.style.use('ggplot')
series=pd.Series([a,b,c], index=['Pucca Houses', 'Semipucca Houses', 'Kuccha Houses'], name='Household Structure Type')
series.plot.pie(figsize=(6, 6))
Out[53]:
Kuccha houses.
In [54]:
a=slums['percentageoftenure_patta'].mean()
In [55]:
b=slums['percentageoftenure_pc'].mean()
In [56]:
c=slums['percentageoftenure_private'].mean()
In [57]:
d=slums['percentageoftenure_public'].mean()
In [58]:
e=slums['percentageoftenure_renters'].mean()
In [59]:
f=slums['percentageoftenure_other'].mean()
In [60]:
plt.style.use('ggplot')
series=pd.Series([a,b,c,d,e,f], index=['Pattas', 'PC', 'Private Land','Public Land','Renters','Others'], name='Tenure Type')
series.plot.pie(figsize=(6, 6))
Out[60]:
In [61]:
a=slums['percentageoftenure_0to1'].mean()
Out[61]:
In [62]:
b=slums['percentageoftenure_1to3'].mean()
Out[62]:
In [63]:
c=slums['percentageoftenure_3to5'].mean()
Out[63]:
In [64]:
d=slums['percentageoftenure_morethan5'].mean()
Out[64]:
In [67]:
plt.style.use('ggplot')
series=pd.Series([a,b,c,d], index=['Zero to One Year', 'One to Three Years', 'Three to Five Years','More than Five Years'])
series.plot.pie(figsize=(6, 6))
Out[67]:
In [212]:
slums['avg_monthly_income'].mean()
Out[212]:
In [213]:
slums['avg_monthly_expenditure'].mean()
Out[213]:
In [211]:
slums['debts_outstanding'].mean()
Out[211]: